Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in model-free or model-based fashion, using a priori or online collected system data to train the involved parametric architectures. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method that guarantees practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed result, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting: the controller receives the state of the system and computes the control action at discrete, specifically equidistant, moments in time. The method is tested in adaptive traction control and cruise control, where it proved to significantly reduce the cost.
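As a brief illustration of the sampled, digital setting described above (the feedback map \kappa and sampling period \delta are generic symbols assumed here, not the paper's notation), the control action is computed from the sampled state and held constant between sampling instants:

\[
u(t) \;=\; \kappa\big(x(t_k)\big), \qquad t \in [t_k, t_{k+1}), \qquad t_{k+1} = t_k + \delta .
\]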
Reinforcement learning remains one of the major directions of contemporary development in control engineering and machine learning. Fine intuition, flexible settings, and ease of application are among the many perks of this methodology. From the machine learning perspective, the main strength of a reinforcement learning agent is that it "captures" (learns) the optimal behavior in a given environment. Commonly, the agent is built on neural networks, and it is their approximation abilities that give rise to the above belief. From the control engineering perspective, however, reinforcement learning has serious deficiencies. The most significant one is the lack of stability guarantees for the closed loop with the environment. A great deal of research is directed at stabilizing reinforcement learning. Speaking of stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that many techniques for stabilizing reinforcement learning rely, in one way or another, on Lyapunov theory. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function. Employing such a pairing thus seems very attractive for designing stabilizing reinforcement learning. However, computation of a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is natural if the given system (read: environment) is stabilizable, but we do not need to compute one.
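For context, a standard form of the existence assumption mentioned above (written in generic notation, for a smooth candidate $V$ and class-$\mathcal{K}_\infty$ functions $\alpha_{1,2,3}$; not the paper's notation) is

\[
\alpha_1(\lVert x \rVert) \;\le\; V(x) \;\le\; \alpha_2(\lVert x \rVert),
\qquad
\inf_{u}\; \nabla V(x)^{\top} f(x,u) \;\le\; -\alpha_3(\lVert x \rVert);
\]

such a $V$ only needs to exist and never has to be computed.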
Control Lyapunov functions are a central tool in stabilization. They generalize the notion of an abstract energy function (a Lyapunov function) to the case of controlled systems. It is a well-known fact that most control Lyapunov functions are nonsmooth; this is also the case for nonholonomic systems, such as wheeled robots and cars. Stabilization frameworks that use nonsmooth control Lyapunov functions exist, such as Dini aiming and steepest descent. This work generalizes the related results to the stochastic case. As the groundwork, a sampled control scheme is chosen in which control actions are computed at discrete moments in time using discrete measurements of the system state. In such a setting, special attention should be paid to the sample-to-sample behavior of the control Lyapunov function. A particular challenge here is the random noise acting on the system. The central result of this work is a theorem stating, roughly, that if there is a control Lyapunov function, in general nonsmooth, then the given stochastic dynamical system can be practically stabilized in the sample-and-hold mode, meaning that the control action is held constant within each sampling time step. The particular control method chosen is based on Moreau-Yosida regularization, in other words, inf-convolution of the control Lyapunov function, but the overall framework is extendable to further control schemes. The system noise is assumed to be bounded almost surely, although the case of unbounded noise is briefly addressed.
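The Moreau-Yosida regularization (inf-convolution with a quadratic) referred to above is the standard construction: for a parameter $\lambda > 0$,

\[
V_{\lambda}(x) \;=\; \inf_{y}\Big\{\, V(y) \;+\; \tfrac{1}{2\lambda}\,\lVert x - y \rVert^{2} \Big\},
\]

which lies below $V$ and recovers it pointwise as $\lambda \downarrow 0$.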
This is a short comment on the paper "Asymptotically stable adaptive-optimal control algorithm with saturating actuators" by Vamvoudakis et al. The question of stability of reinforcement learning (RL) agents remains hard, and the said work suggested a suitable stability property using a technique from adaptive control, namely a robustifying term added to the action. However, there is an issue with this approach to stabilizing RL, which we explain in this note. Moreover, Vamvoudakis et al. appear to have made a fallacious assumption on the Hamiltonian under a generic policy. To provide a positive result, we not only point out this mistake but also show convergence of the critic neural network weights under the commented approach in a stochastic continuous environment, provided certain conditions on the behavior policy hold.
A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable in such applications as video-games or puzzles, physical systems are time-continuous. A general variant of RL is of digital format, where updates of the value (or cost) and policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, whereby sample-and-hold is a specific case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics learning the optimal Q- and value (or cost-to-go) function. Optimality is analyzed and performance comparison is done in an experimental case study with a mobile robot.
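One common way such a hybrid can be arranged (illustrative notation, not necessarily the paper's) is to let the critic supply the terminal term of the finite-horizon MPC objective:

\[
\min_{u_0,\dots,u_{N-1}} \;\; \sum_{k=0}^{N-1} \rho(x_k, u_k) \;+\; \hat{J}_{\theta}(x_N)
\qquad \text{s.t.} \quad x_{k+1} = f(x_k, u_k),
\]

where $\rho$ is the stage cost, $f$ the prediction model, and $\hat{J}_{\theta}$ the learned value (cost-to-go) estimate; with a Q-critic, the last stage can instead be scored by $\hat{Q}_{\theta}(x_{N-1}, u_{N-1})$.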
In this paper, we propose a new neural network architecture based on the H2 matrix. Even though networks with H2-inspired architecture already exist, our approach is designed to reduce memory costs and improve performance by taking into account the sparsity template of the H2 matrix. In a numerical comparison with alternative neural networks, including the known H2-based ones, our architecture proved beneficial in terms of performance, memory, and scalability.
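As a loose illustration only (this is not the paper's architecture), the memory saving at stake comes from replacing a dense weight matrix with dense diagonal blocks plus low-rank off-diagonal coupling, the simplest instance of the block low-rank pattern that hierarchical (H/H2) matrices refine. A minimal NumPy sketch, with all names and shapes assumed:

```python
import numpy as np

class BlockLowRankLinear:
    """Dense diagonal blocks plus rank-r coupling between blocks (illustrative only)."""
    def __init__(self, block_size, n_blocks, rank, rng=None):
        rng = rng or np.random.default_rng(0)
        b, m, r = block_size, n_blocks, rank
        self.diag = rng.standard_normal((m, b, b)) / np.sqrt(b)   # m dense b-by-b blocks
        self.U = rng.standard_normal((m, b, r)) / np.sqrt(b)      # per-block "uncompress" factors
        self.V = rng.standard_normal((m, b, r)) / np.sqrt(b)      # per-block "compress" factors
        self.block_size, self.n_blocks = b, m

    def __call__(self, x):
        xb = x.reshape(self.n_blocks, self.block_size)            # split input into blocks
        y_diag = np.einsum("mij,mj->mi", self.diag, xb)           # near-field: dense diagonal blocks
        coeffs = np.einsum("mjr,mj->mr", self.V, xb)              # compress each block to r numbers
        mixed = coeffs.sum(axis=0, keepdims=True) - coeffs        # interact with all *other* blocks
        y_off = np.einsum("mir,mr->mi", self.U, mixed)            # far-field: low-rank coupling
        return (y_diag + y_off).reshape(-1)

layer = BlockLowRankLinear(block_size=16, n_blocks=8, rank=4)
y = layer(np.random.default_rng(1).standard_normal(128))          # 128 = 16 * 8
```

The parameter count here grows roughly as m*(b^2 + 2*b*r) instead of (m*b)^2 for a dense matrix, which is where the memory advantage of such sparsity templates comes from.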
t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data. Most standard packages of t-SNE, such as scikit-learn, use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However, existing CPU implementations of this algorithm are inefficient. In this work, we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing sequential steps, and improving parallelization of multithreaded steps. Our implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a 32-core Intel(R) Icelake cloud instance.
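For reference, the scikit-learn baseline mentioned above is invoked as below; this shows only the baseline side of the comparison (Acc-t-SNE's own interface is not reproduced here), and the synthetic data dimensions are arbitrary:

```python
# Illustrative timing of scikit-learn's Barnes-Hut t-SNE on synthetic data.
import time
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).standard_normal((10_000, 50)).astype(np.float32)

t0 = time.perf_counter()
emb = TSNE(n_components=2, method="barnes_hut", n_jobs=-1, random_state=0).fit_transform(X)
print(f"BH t-SNE on {len(X)} points: {time.perf_counter() - t0:.1f} s, embedding {emb.shape}")
```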
We investigate a model for image/video quality assessment based on building a set of codevectors representing, in a sense, some basic properties of images, similar to the well-known CORNIA model. We analyze the codebook building method and propose some modifications to it. The algorithm is also investigated from the standpoint of inference time reduction. Both natural and synthetic images are used for building codebooks, and some analysis of the synthetic images used for the codebooks is provided. It is demonstrated that the quality assessment results may be improved with the use of synthetic images for codebook construction. We also demonstrate regimes of the algorithm in which real-time execution on a CPU is possible while keeping sufficiently high correlations with the mean opinion score (MOS). Various pooling strategies are considered, as well as the problem of metric sensitivity to bitrate.
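A hedged sketch of a CORNIA-style codebook encoder, to make the pipeline concrete; the patch size, codebook size, normalization, and function names below are illustrative assumptions, not the paper's settings:

```python
# Illustrative CORNIA-style encoding: normalized local patches are soft-assigned
# to codevectors and max-pooled into a fixed-length feature for quality regression.
import numpy as np

def extract_patches(img, patch=7, stride=7):
    h, w = img.shape
    patches = [img[i:i+patch, j:j+patch].ravel()
               for i in range(0, h - patch + 1, stride)
               for j in range(0, w - patch + 1, stride)]
    P = np.asarray(patches, dtype=np.float64)
    P -= P.mean(axis=1, keepdims=True)                    # local contrast normalization
    P /= np.linalg.norm(P, axis=1, keepdims=True) + 1e-8
    return P

def encode(img, codebook):
    """codebook: (K, patch*patch) array of unit-norm codevectors."""
    P = extract_patches(img)
    sims = P @ codebook.T                                 # soft-assignment scores
    # Max-pool positive and negative responses separately, CORNIA-style.
    return np.concatenate([sims.max(axis=0), (-sims).max(axis=0)])

rng = np.random.default_rng(0)
codebook = rng.standard_normal((100, 49))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
feature = encode(rng.random((240, 320)), codebook)        # length 2*K feature vector
```

A regressor (e.g., linear SVR) mapping such features to MOS would complete the pipeline; the paper's contribution concerns how the codebook itself is built, including from synthetic images.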
Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies. Randomization enables the inference of causality from an A/B test. The randomized assignment maps end users to experiment buckets and balances user characteristics between the groups. Therefore, experiments can attribute any outcome differences between the experiment groups to the product feature under experiment. Technology companies run A/B tests at scale -- hundreds if not thousands of A/B tests concurrently, each with millions of users. The large scale poses unique challenges to randomization. First, the randomized assignment must be fast since the experiment service receives hundreds of thousands of queries per second. Second, the variant assignments must be independent between experiments. Third, the assignment must be consistent when users revisit or an experiment enrolls more users. We present a novel assignment algorithm and statistical tests to validate the randomized assignments. Our results demonstrate that this algorithm is not only computationally fast but also satisfies the statistical requirements -- unbiased and independent.
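For background, a conventional hash-based bucketing scheme that meets the three requirements above (fast, consistent on revisits, and decorrelated across experiments by salting with the experiment id) looks as follows; this is illustrative only, not the paper's proposed algorithm or its validation tests:

```python
# Minimal sketch of hash-based experiment bucketing (illustrative, assumed names).
import hashlib

def assign_bucket(user_id: str, experiment_id: str, n_buckets: int = 2) -> int:
    # Salting with the experiment id decorrelates assignments across experiments;
    # hashing only stable identifiers keeps assignment consistent on revisits
    # and when the experiment later enrolls more users.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

print(assign_bucket("user-42", "exp-checkout-v2"))   # same output on every call
```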
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
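A naive sketch of the idea of training with randomized patch sizes (PyTorch-style code with assumed shapes; the released big_vision implementation resizes the patch-embedding weights in a more principled way than the plain bilinear interpolation used here):

```python
# Naive sketch: one patch-embedding kernel is resampled to the patch size drawn
# for the current step, so a single set of weights serves many patch sizes.
import random
import torch
import torch.nn.functional as F

base_kernel = torch.randn(192, 3, 32, 32)            # (embed_dim, channels, p0, p0), assumed sizes

def embed_with_patch_size(images, p):
    k = F.interpolate(base_kernel, size=(p, p), mode="bilinear", align_corners=False)
    return F.conv2d(images, k, stride=p)              # tokens: (B, embed_dim, H/p, W/p)

for step in range(3):
    p = random.choice([8, 16, 32, 48])                # randomize the patch size per step
    tokens = embed_with_patch_size(torch.randn(2, 3, 192, 192), p)
    print(p, tuple(tokens.shape))
```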